Skip to content

fix(server): avoid extracting text range filters#3034

Open
LegendPei wants to merge 11 commits into
apache:masterfrom
LegendPei:fix/fix-text-range-filter
Open

fix(server): avoid extracting text range filters#3034
LegendPei wants to merge 11 commits into
apache:masterfrom
LegendPei:fix/fix-text-range-filter

Conversation

@LegendPei
Copy link
Copy Markdown

@LegendPei LegendPei commented May 21, 2026

Purpose of the PR

This PR fixes the query-planning path exposed by #2935.

For top-level traversals like g.V().has("vp4", P.lt("")).repeat(...).count(),
HugeGraph may extract the text range predicate into a backend query. However,
range predicates on text properties should not be planned as backend range
index queries. This can make the direct traversal fail while an equivalent
match() traversal returns normally.

This change keeps such text range predicates in the traversal layer instead of
pushing them down to the backend query, making the direct traversal consistent
with the equivalent match() form.

Main Changes

  • Avoid extracting Text property predicates with gt/gte/lt/lte into
    HugeGraphStep / HugeVertexStep backend queries.
  • Preserve normal extraction for system properties and non-text property
    predicates.
  • Add a regression test for the reported traversal shape:
    has("vp4", P.lt("")).repeat(__.out("el2")).emit().times(1).count().
  • Verify that the direct traversal returns the same result as the equivalent
    match() traversal.

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    - mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=CountStrategyCoreTest#testRepeatAfterTextRangeFilterWithEmptyResult -DfailIfNoTests=false "-Drat.skip=true" "-Dcheckstyle.skip=true" test

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels May 21, 2026
@imbajin imbajin requested a review from Copilot May 21, 2026 06:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a traversal query-planning bug where range predicates (gt/gte/lt/lte) on Text properties were being extracted/pushed down into backend queries, leading to incorrect behavior (and the exception reported in #2935). The change keeps these text range predicates in the traversal layer while adding a regression test to ensure direct traversals behave consistently with equivalent match() traversals.

Changes:

  • Prevent extraction of HasStep containers when they include range comparisons on Text properties.
  • Keep existing extraction behavior for system properties and non-text property predicates (intended).
  • Add a regression test reproducing the traversal shape from #2935 and asserting consistent results between direct and match() forms.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java Adds a guard to stop extracting HasContainers with text range predicates into backend query steps.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/core/CountStrategyCoreTest.java Adds a regression test covering a text range filter followed by repeat(...).emit().times(1).count().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

LegendPei added 2 commits May 21, 2026 14:56
…fix/fix-text-range-filter

# Conflicts:
#	hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/traversal/optimize/TraversalUtil.java
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.36%. Comparing base (454dd3d) to head (0b45ebb).

Files with missing lines Patch % Lines
...he/hugegraph/traversal/optimize/TraversalUtil.java 88.23% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3034      +/-   ##
============================================
- Coverage     35.94%   31.36%   -4.58%     
- Complexity      338      500     +162     
============================================
  Files           803      814      +11     
  Lines         68053    69240    +1187     
  Branches       8907     9113     +206     
============================================
- Hits          24460    21720    -2740     
- Misses        40968    45118    +4150     
+ Partials       2625     2402     -223     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels May 21, 2026
@LegendPei LegendPei requested a review from Copilot May 21, 2026 07:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

@LegendPei LegendPei requested a review from Copilot May 21, 2026 11:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

@LegendPei LegendPei requested a review from Copilot May 21, 2026 12:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

@LegendPei LegendPei requested a review from Copilot May 21, 2026 12:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

@LegendPei LegendPei requested a review from Copilot May 21, 2026 13:15
Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original correctness fix is clear, but the latest version now broadens the PR into a more aggressive HasStep split-and-remove optimization. I left one scope/risk comment suggesting we keep this PR on the minimal safe path and move the broader optimization to a follow-up.

if (!GraphStep.processHasContainerIds(newStep, has)) {
newStep.addHasContainer(has);
}
holder.removeHasContainer(has);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ I think we should scope this PR back down before merging. The original bug is a focused correctness issue: text range predicates such as has("vp4", P.lt("")) should not be pushed into the backend query path. The latest implementation now goes further and splits a mixed HasStep, pushes selected containers down, removes them from the original HasStep, and adds new label/index-availability logic in TraversalUtil.

Original bug boundary

Text range predicate
        |
        v
wrongly pushed to backend
        |
        v
exception / mismatch with match()
Latest implementation boundary

HasStep(vp4 < "", age = 1)
        |
        +-- push age = 1 into HugeGraphStep / HugeVertexStep
        |
        +-- remove age = 1 from the original HasStep
        |
        +-- rely on new label/index checks to prove backend can enforce it

That is a much broader optimizer change than the original fix, and once a container is removed from the original HasStep, correctness depends on canExtractHasContainer() exactly matching what the backend planner can really execute. The new tests cover several cases, but this still expands the review surface into composite-index matching, label context, paging/sysprop interactions, and future index-planner behavior.

For this PR, I would prefer one of these smaller and safer shapes:

Option 1: minimal correctness fix
  - keep unsafe Text + range predicates in the traversal layer
  - do not split mixed HasSteps
  - avoid adding backend index-availability logic here

Option 2: low-risk partial prefiltering
  - optionally push safe containers down as redundant backend pre-filters
  - keep the original HasStep unchanged unless all containers are extractable
  - correctness remains guarded by the full traversal-layer filter
Safer partial prefiltering shape

Backend prefilter:
  age = 1

Traversal filter still intact:
  vp4 < "" AND age = 1

Result:
  backend may reduce candidates, but traversal still owns correctness

The current split-and-remove optimization can still be valuable, but I think it should be a separate follow-up PR with dedicated tests for composite-index prefix matching, label-less queries, ~page / paging semantics when some user filters remain in traversal, and parity with the backend index planner. Keeping this PR minimal would make the #2935 fix easier to reason about and safer to merge.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sorry. I was too obsessed with fixing the review issue of copilot earlier, which led to a slight deviation in direction. I'll adjust my focus now.

Copy link
Copy Markdown
Member

@imbajin imbajin May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adjusting the direction. To clarify my previous comment: I do not mean that we should choose freely between the two shapes. For this PR, I would prefer the minimal option A.

The original issue is small and focused: Text range predicates should not be extracted into the backend query path. So the safest fix here is to add the narrow condition that blocks only that unsafe case, while keeping the existing extraction behavior unchanged for the rest.

Preferred shape for this PR

HasStep(vp4 < "", age = 1)
        |
        v
if any extracted container is Text + range:
  keep this HasStep in traversal layer
  avoid backend text-range extraction

No new index-planner logic
No mixed HasStep split/remove behavior
No broader optimizer contract change

This may leave a small performance loss for the uncommon case where one has() step mixes a text-range predicate with other safe predicates, but it keeps the patch proportional to the bug and avoids introducing new correctness risks around label context, index matching, composite index prefixes, paging, or unique/search/numeric index behavior.

PS: the more aggressive split-and-remove optimization can still be considered later

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. I agree with option A for this PR.

I have scoped the local changes back to the minimal correctness fix:

  • no mixed HasStep split/remove behavior
  • no new index-planner or label/index availability logic
  • if a HasStep contains a Text range predicate (gt/gte/lt/lte), keep the whole HasStep in the traversal layer
  • keep the existing extraction behavior for other cases as much as possible

This intentionally accepts the small potential performance loss for mixed HasSteps, so the fix stays proportional to #2935. I’ll push the narrowed patch after one final local check.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this fix adopts a conservative strategy. When encountering the same HasStep mixed with text range and other security conditions, the entire HasStep will be retained, possibly with less backend push-down optimization

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Unexpected ArrayIndexOutOfBoundsException for filter-step and repeat()-step

3 participants